Morphological and Syntactic Processing for Text Retrieval
نویسندگان
چکیده
This article describes the application of lemmatization and shallow parsing as a linguistically-based alternative to stemming in Text Retrieval, with the aim of managing linguistic variation at both word level and phrase level. Several alternatives for selecting the index terms among the syntactic dependencies detected by the parser are evaluated. Though this article focuses on Spanish, this approach is extensible to other languages by simply adapting the grammar used by the parser.
منابع مشابه
On the Usefulness of Extracting Syntactic Dependencies for Text Indexing
In recent years, there has been a considerable amount of interest in using Natural Language Processing in Information Retrieval research, with specific implementations varying from the word-level morphological analysis to syntactic parsing to conceptual-level semantic analysis. In particular, different degrees of phrase-level syntactic information have been incorporated in information retrieval...
متن کاملCombining Part of Speech Induction and Morphological Induction
Linguistic information is useful in natural language processing, information retrieval and a multitude of sub-tasks involving language analysis. Two types of linguistic information in all languages are part of speech and morphology. Part of speech information reflects syntactic structure and can assist in tasks such as speech recognition, machine translation and word sense disambiguation. Morph...
متن کاملMorpho-syntactic tagging system based on the patterns words for arabic texts
Text tagging is a very important tool for various applications in natural language processing, namely the morphological and syntactic analysis of texts, indexation and information retrieval, "vocalization" of Arabic texts, and probabilistic language model (n-class model). However, these systems based on the lexemes of limited size, are unable to treat unknown words consequently. To overcome thi...
متن کاملA Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...
متن کاملExploitation de la morphologie pour l'extraction automatique de paraphrases grand public des termes médicaux
The medical area conveys very specific terms (p. ex., blepharospasm, appendicectomy), which are difficult to understand by people without medical training. We propose an automatic method for the acquisition of paraphrases, which we expect to be easier to understand than the original terms. The method is based on the morphological analysis of terms, syntactic analysis of texts, and text mining o...
متن کامل